
AD-A103 852  WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER  F/G 12/1

AN EXAMPLE OF THE USE OF ANDREWS' PLOTS TO DETECT TIME VARIATIO-- ETC (U)
JUL 81  A M HERZBERG  DAAG29-80-C-0041

UNCLASSIFIED  MRC-TSR-2239  NL




MRC Technical Summary Report #2239




AN  EXAMPLE  OF  THE  USE  OF  ANDREWS'  PLOTS 
TO  DETECT  TIME  VARIATIONS  IN  MODEL 
PARAMETERS  AND  OUTLYING  OBSERVATIONS 

Agnes  M.  Herzberg 


Mathematics  Research  Center 
University of Wisconsin-Madison
610  Walnut  Street 
Madison,  Wisconsin  53706 

July  1981 


(Received  June  23,  1981) 




Approved for public release
Distribution  unlimited 


Sponsored by

U. S. Army Research Office
P. O. Box 12211
Research Triangle Park
North Carolina 27709



AN EXAMPLE OF THE USE OF ANDREWS' PLOTS TO DETECT TIME
VARIATIONS IN MODEL PARAMETERS AND OUTLYING OBSERVATIONS

Agnes  M.  Herzberg* 

Technical  Summary  Report  #2239 
July  1981 

ABSTRACT 

Andrews  (1972)  introduced  a  method  of  plotting  high-dimensional  data  in 
two dimensions. This method is exploited as a graphical tool for the
examination  of  changes  over  time  in  the  parameters  of  a  time  series  model.  An 
example  using  a  Fourier  series  model  is  given  to  illustrate  the  method.  It  is 
also  shown  how  outlying  observations  in  the  data  can  be  found. 


UNIVERSITY  OF  WISCONSIN-MADISON 
MATHEMATICS  RESEARCH  CENTER 


AMS  (MOS)  Subject  Classifications:  62M10,  62H30 

Key  Words:  Andrews'  plots,  time  series,  outliers,  spurious  observations, 
exploratory  analysis. 

Work  Unit  Number  4  (Statistics  and  Probability) 


*Imperial College of Science and Technology, University of London,
London,  U.  K. 


Sponsored  by  the  United  States  Army  under  Contract  No.  DAAG29-80-C-0041 


SIGNIFICANCE  AND  EXPLANATION 


Andrews (1972) introduced a method of plotting high-dimensional data in
two dimensions. In his method, Andrews represents each multidimensional point
by a Fourier function. The clustering of plots of these functions is
equivalent to the clustering of the multidimensional points. Andrews' method
is exploited as a graphical tool for exploratory data analysis for the
examination of changes over time in the parameters of a time series model. An
example using the total Canadian unemployment figures from 1956-1975 is used
to illustrate the method. These data have four spurious (outlying)
observations and it is shown how these may be detected by the use of Andrews'
plots.



The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 




AN EXAMPLE OF THE USE OF ANDREWS' PLOTS TO DETECT TIME
VARIATIONS IN MODEL PARAMETERS AND OUTLYING OBSERVATIONS


Agnes M. Herzberg


1. Introduction


A graphical method is given for examining changes over time in the
parameters of a time series model. The method can be used as an aid in
exploratory data analysis. Herzberg and Hickie (1981) present the method and
give two examples. A brief description of various multivariate graphical
clustering methods, and of the use of Andrews' plots as a graphical tool in
time series analysis, is also given in Herzberg (1981). Here another model is
used with a different set of data, and the detection of outliers, or spurious
observations, is discussed further.


2.  Andrews'  Plots 

Andrews (1972) proposed the following simple and useful method of
plotting high-dimensional data in two dimensions. If the data are
$m$-dimensional, each point $x = (x_1, \ldots, x_m)$, where $x_i$
$(i = 1, \ldots, m)$ are the measured variables, is represented by the
function

$$f_x(t) = x_1 2^{-1/2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots \qquad (1)$$

plotted over the range $-\pi < t < \pi$. The functions given by (1) have
several properties, including the preservation of means, distances and
variances, and also give one-dimensional projections. Thus, when (1) is
plotted for each data point $x$, the clustering of the points may be seen by
a banding together of the plots of the functions. Tests of significance may
also be made; see Herzberg and Hickie (1981).
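
As a minimal sketch of (1), the function can be evaluated on a grid of $t$ values and the distance property checked numerically. The name `andrews_curve` and the random test points are illustrative assumptions, not part of the report. The check uses the standard identity that the integral of $(f_x(t) - f_y(t))^2$ over $(-\pi, \pi)$ equals $\pi$ times the squared Euclidean distance between $x$ and $y$.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate f_x(t) of equation (1) for one point x on a grid t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, x[0] / np.sqrt(2.0))   # constant term x_1 / sqrt(2)
    for i in range(1, len(x)):
        h = (i + 1) // 2                        # harmonic: 1, 1, 2, 2, 3, ...
        # odd positions carry sin(h t), even positions carry cos(h t)
        f = f + x[i] * (np.sin(h * t) if i % 2 == 1 else np.cos(h * t))
    return f

# Two made-up 5-dimensional points (illustrative only).
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

# Equally spaced grid over one full period; the mean times 2*pi
# integrates a trigonometric polynomial of this degree exactly.
t = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
integral = 2 * np.pi * np.mean((andrews_curve(x, t) - andrews_curve(y, t)) ** 2)
squared_distance = np.sum((x - y) ** 2)
```

Here `integral` agrees with `np.pi * squared_distance` up to rounding, which is why points that are close in $m$ dimensions give curves that band together.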


3.  Variation  of  Model  Parameters 

Herzberg and Hickie (1981) considered the regression model

$$Y_j = X\beta_j + U_j \qquad (j = 1, \ldots, T-n+1),$$

where $T$ is the total number of observations, $n$ is the number of
observations in each subgroup of observations used for estimating the unknown
parameters, $Y_j = (y_{1j}, \ldots, y_{nj})'$ is an $n \times 1$ vector,
$y_{ij}$ being the $i$th observation in the $j$th subgroup
$(i = 1, \ldots, n)$, $X$ is the $n \times m$ matrix of the regressors,
$\beta_j$ is the $m \times 1$ vector of unknown parameters to be estimated by
least squares and $U_j$ is the $n \times 1$ vector of error terms. All the
elements of the $U_j$'s are assumed to be independent and normally
distributed with mean 0 and variance $\sigma^2$. It is assumed that the $T$
observations are taken sequentially over time and it is desired to examine the
variation in the $\beta_j$ over time.

Let $\hat\beta_j = (\hat\beta_{1j}, \ldots, \hat\beta_{mj})'$ be the
$m \times 1$ vector of least squares estimates of the elements of $\beta_j$
obtained from the $j$th set of $n$ observations $(n < T)$, i.e.
$\hat\beta_1$ is estimated from the first $n$ observations, $\hat\beta_2$ is
estimated from the second observation to the $(n+1)$st observation, etc.
From each $\hat\beta_j$, a plot of the function $f_{\hat\beta_j}(t)$, defined
in (1), over the range $-\pi < t < \pi$ was made. The plots of these
functions will show the change over time in the vector of coefficients
$\beta_j$. The plots, $f_{\hat\beta_j}(t)$, can be considered as a graphical
weighted moving average. For each $t$ a different weighting is given to the
observations.
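
The moving-window estimation and the weighted moving average reading can be sketched as follows. This is an illustration under stated assumptions rather than the computation used in the report: the helper `rolling_ols`, the two-column regressor matrix and the random series are all hypothetical. Because the estimate for window $j$ is linear in $Y_j$, the value of the curve at a fixed $t$ is a fixed set of weights applied to the $n$ observations in the window.

```python
import numpy as np

def rolling_ols(y, X):
    """Least squares estimates for every window of n = X.shape[0]
    consecutive observations; returns an array of shape (T - n + 1, m)."""
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    pinv = np.linalg.pinv(X)          # equals (X'X)^{-1} X' for full-rank X
    return np.array([pinv @ y[j:j + n] for j in range(len(y) - n + 1)])

# Hypothetical setting: T = 30 observations, windows of n = 8,
# regressors = intercept and linear trend (m = 2).
rng = np.random.default_rng(1)
T, n = 30, 8
X = np.column_stack([np.ones(n), np.arange(1, n + 1)])
y = rng.normal(size=T)
B = rolling_ols(y, X)                 # one row of estimates per window

# Curve (1) at a fixed t0 is a(t0)' beta_hat_j with a(t0) = (1/sqrt(2), sin t0).
t0 = 0.7
a = np.array([1 / np.sqrt(2), np.sin(t0)])
w = np.linalg.pinv(X).T @ a           # weights on the n observations in a window

curve_value = a @ B[5]                # curve for the 6th window at t0
moving_average = w @ y[5:5 + n]       # same number, as a weighted sum of data
```

Changing `t0` changes `w`, which is the sense in which each $t$ gives a different weighting to the observations.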


4.  An  Example 

Table  1  shows  the  total  Canadian  unemployment  figures  from  January  1956 
to  December  1975.  It  can  be  seen  that  the  values  for  January  1958,  1961,  1971 
and  1975  could  be  considered  as  being  outliers  or  spurious  observations  in  the 
data.  Figure  1  gives  a  plot  of  these  data. 

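The model

$$E(y_{j+i-1}) = \beta_{1j} + \beta_{2j}\sin\frac{i\pi}{6} + \beta_{3j}\cos\frac{i\pi}{6} + \beta_{4j}\sin\frac{i\pi}{3} + \beta_{5j}\cos\frac{i\pi}{3} \qquad (2)$$

$(i = 1, \ldots, 12;\ j = 1, \ldots, 229)$,

where $y_{j+i-1}$ is the observed unemployment figure in month $j+i-1$,
was fitted to the data by least squares for each $j$ fixed and
$\hat\beta_j = (\hat\beta_{1j}, \hat\beta_{2j}, \hat\beta_{3j}, \hat\beta_{4j}, \hat\beta_{5j})'$,
the least squares estimate of $\beta_j$, obtained.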
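
A small sketch of fitting (2) window by window is given below. It assumes that the two harmonics in (2) have periods of 12 and 6 months, and it uses a synthetic noise-free series in place of the unemployment figures, which are not reproduced here. The names `fourier_design` and `fit_windows` are illustrative.

```python
import numpy as np

def fourier_design(n=12, period=12):
    """n x 5 regressor matrix: constant, then sin and cos of the first
    two harmonics (assumed periods: `period` and `period` / 2)."""
    i = np.arange(1, n + 1)
    w = 2 * np.pi * i / period        # i*pi/6 when period = 12
    return np.column_stack(
        [np.ones(n), np.sin(w), np.cos(w), np.sin(2 * w), np.cos(2 * w)]
    )

def fit_windows(y, X):
    """Least squares fit of X to every run of n consecutive values of y."""
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    windows = np.column_stack([y[j:j + n] for j in range(len(y) - n + 1)])
    coef, *_ = np.linalg.lstsq(X, windows, rcond=None)
    return coef.T                     # row j-1 holds the estimate for window j

# Synthetic monthly series of length 240 (20 years), purely illustrative.
months = np.arange(1, 241)
y = 5 + 2 * np.sin(np.pi * months / 6)

X = fourier_design()
B = fit_windows(y, X)                 # 240 - 12 + 1 = 229 rows
```

For this series the first window returns the coefficients (5, 2, 0, 0, 0), the fourth returns (5, 0, 2, 0, 0), and the pattern repeats every 12 windows. A purely seasonal series therefore gives coefficient vectors that depend on the starting month of the window, so curves are most comparable among windows that start in the same calendar month.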

The functions

$$f_{\hat\beta_j}(t) = \hat\beta_{1j} 2^{-1/2} + \hat\beta_{2j}\cos t + \hat\beta_{3j}\sin t + \hat\beta_{4j}\cos 2t + \hat\beta_{5j}\sin 2t \qquad (3)$$

$(j = 1, \ldots, 229)$,

were then plotted. Note that (3) differs from (1) but the mathematical
properties of (1) are retained. Several variations of (1) were tried but the
outlying plots were most easily seen when (3) was used. This is due to the
particular weighting which (3) gives to the $\hat\beta_{ij}$'s and thus to
the individual observations.
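
One way to see why the properties of (1) carry over to (3) is that (3) is (1) applied to the same coefficients with the sine and cosine positions interchanged within each harmonic, and a permutation of coordinates leaves Euclidean distances unchanged. The sketch below checks this numerically; the function names and the coefficient values are made up for illustration.

```python
import numpy as np

def f1(x, t):
    """Equation (1) for a 5-dimensional point x."""
    return (x[0] / np.sqrt(2) + x[1] * np.sin(t) + x[2] * np.cos(t)
            + x[3] * np.sin(2 * t) + x[4] * np.cos(2 * t))

def f3(b, t):
    """Equation (3): cosine terms precede sine terms in each harmonic."""
    return (b[0] / np.sqrt(2) + b[1] * np.cos(t) + b[2] * np.sin(t)
            + b[3] * np.cos(2 * t) + b[4] * np.sin(2 * t))

b = np.array([4.0, 1.0, -2.0, 0.5, 3.0])   # made-up coefficient vector
t = np.linspace(-np.pi, np.pi, 201)

# Swap positions 2 <-> 3 and 4 <-> 5 (0-based: 1 <-> 2 and 3 <-> 4).
swapped = b[[0, 2, 1, 4, 3]]
same_curve = np.allclose(f3(b, t), f1(swapped, t))
```

What does change between (1) and (3) is which coefficient is multiplied by which basis function at a given $t$, and hence the weighting given to the individual observations.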

It  could  be  seen  from  the  plots  when  plotted  in  chronological  order  on  a 
graphics  terminal  that  certain  ones  stood  out  from  the  others.  Any  long  term 
increases  or  decreases  in  the  plots  were  also  noted. 

The 229 Andrews' plots are given in Fig. 2 and Fig. 3. The plots in
Fig. 2.k (k = 1,...,12) are those obtained from (3) for j = k, k + 12,
k + 24,..., k + 108. The plots in Fig. 2.k are similar except for the ones
denoted by a thicker line. These are the ones whose coefficients are
estimated from subgroups that include January 1958 or 1961. The plots in
Fig. 3.k (k = 1,...,12) are those obtained from (3) for j = k + 120,
k + 132,..., k + 228 (j ≤ 229). The plots in Fig. 3.k are similar except for
the ones denoted by a thicker line. These are the ones whose coefficients are
estimated from subgroups that include January 1971 or 1975.

FIG. 2. Andrews' plots, $f_{\hat\beta_j}(t)$ (j = 1,...,120) given by (3), $\hat\beta_j$ obtained from (2) and sorted
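
The bookkeeping behind the panels can be sketched as follows, taking January 1956 as month 1 and each subgroup $j$ as months $j, \ldots, j+11$. The helper names are hypothetical; the sketch only lists which window indices fall in each panel and which windows contain one of the four January values singled out in the text.

```python
def month_index(year, start_year=1956):
    """1-based index of January of `year`, with January 1956 as month 1."""
    return 12 * (year - start_year) + 1

def windows_containing(month, n=12, n_windows=229):
    """1-based window indices j whose months j, ..., j+n-1 include `month`."""
    return [j for j in range(1, n_windows + 1) if j <= month <= j + n - 1]

def panel(k, offset, n_windows=229):
    """Window indices j = k + offset, k + offset + 12, ... for one panel,
    at most ten of them, dropping any index beyond the last window."""
    return [j for j in range(k + offset, k + offset + 120, 12) if j <= n_windows]

# Windows whose data include each of the four January values.
flagged = {yr: windows_containing(month_index(yr)) for yr in (1958, 1961, 1971, 1975)}

fig2 = {k: panel(k, 0) for k in range(1, 13)}      # panels Fig. 2.1, ..., 2.12
fig3 = {k: panel(k, 120) for k in range(1, 13)}    # panels Fig. 3.1, ..., 3.12
```

Under these assumptions each of the four months falls in 12 consecutive windows, so every panel of Fig. 2 contains exactly one window with January 1958 and one with January 1961. Only panel 3.1 reaches j = 229; the other Fig. 3 panels hold nine curves.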

Thus  Andrews'  plots  can  be  used  as  a  graphical  method  not  only  to  examine 
changes  over  time  in  the  parameters  but  also  to  detect  abrupt  changes  in  the 
observations  reflected  by  changes  in  the  parameters  of  the  model  over  time. 

As  mentioned  elsewhere,  the  Andrews'  plots  can  also  be  used  to  determine  the 
period  length  when  this  is  unknown. 